Abstract. Special scattered subwords, in which the gaps are of length 
from a given set, are defined. The scattered subword complexity, which is 
the number of such scattered subwords, is computed for rainbow words. 

1 Introduction 

Sequences of characters called words or strings are widely studied in
combinatorics, and used in various fields of science (e.g. chemistry, physics,
social sciences, biology [2, 3, 4, 11] etc.). The elements of a word are called
letters. A contiguous part of a word (obtained by erasing a prefix and/or a
suffix) is a subword or factor. If we erase arbitrary letters from a word, what
is obtained is a scattered subword. Special scattered subwords, in which the
consecutive letters are at distance at most d ($d \ge 1$) in the original word,
are called d-subwords [7, 8]. In [9] the super-d-subword is defined, in which
case the distances are of length at least d. The super-d-complexity, as the
number of such subwords, is computed for rainbow words (words with pairwise
different letters).

In this paper we define special scattered subwords, for which the distance
in the original word of length n between two letters which will be consecutive
in the subword is taken from a subset of $\{1, 2, \ldots, n-1\}$.

The complexity of a word is defined as the number of all its different
subwords. The d-complexity, super-d-complexity and scattered subword
complexity are defined similarly.

The scattered subword complexity is computed in the special case of rainbow
words. The idea of using scattered words with gaps of length between two given
values is from Jozsef Bukor [1].

Another point of view of scattered complexity, in the case of non-primitive
words, is given in [5].

2 Definitions 

Let L be an alphabet, $L^n$, as usual, the set of all words of length n over L,
and $L^*$ the set of all finite words over L.

Definition 1 Let n and s be positive integers, $M \subseteq \{1, 2, \ldots, n-1\}$
and $u = x_1 x_2 \ldots x_n \in L^n$. An M-subword of length s of u is defined as
$v = x_{i_1} x_{i_2} \ldots x_{i_s}$ where

$i_1 \ge 1$,

$i_{j+1} - i_j \in M$ for $j = 1, 2, \ldots, s-1$,

$i_s \le n$.

Definition 2 The number of M-subwords of a word u for a given set M is
the scattered subword complexity, simply M-complexity.

The M-subword in the case of $M = \{1, 2, \ldots, d\}$ is the d-subword defined
in [7], while in the case of $M = \{d, d+1, \ldots, n-1\}$ it is the
super-d-subword defined in [9].

Examples. The word abcd has 11 {1, 3}-subwords: a, ab, abc, abcd, ad, b,
bc, bcd, c, cd, d. The $\{2, 3, \ldots, n-1\}$-subwords of the word abcdef are
the following: a, ac, ad, ae, af, ace, acf, adf, b, bd, be, bf, bdf, c, ce, cf,
d, df, e, f.
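The two examples can be reproduced by brute force. The following is a minimal sketch, not part of the original paper: the function name `m_subwords` and the 0-based indexing are assumptions made for illustration, and the gap set is assumed to contain only positive integers.

```python
def m_subwords(word, gaps):
    """Return the set of M-subwords of `word`; `gaps` plays the role of M.

    Assumption: every gap is a positive integer, so the search terminates.
    """
    allowed = sorted(set(gaps))
    found = set()
    # Each stack entry: (0-based position of the last chosen letter, subword so far).
    stack = [(i, word[i]) for i in range(len(word))]
    while stack:
        last, sub = stack.pop()
        found.add(sub)
        for g in allowed:
            nxt = last + g
            if nxt < len(word):
                stack.append((nxt, sub + word[nxt]))
    return found


print(sorted(m_subwords("abcd", {1, 3})))
print(len(m_subwords("abcdef", range(2, 6))))
```

Because the result is a set, the same function also removes repeated subwords when the input word is not a rainbow word.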

Hereinafter instead of $\{d_1, d_1+1, \ldots, d_2-1, d_2\}$-subword we will use
the simple notation $(d_1, d_2)$-subword.

3 Computing the scattered complexity for rainbow 
words 

Words with pairwise different letters are called rainbow words. The
M-complexity of a rainbow word of length n does not depend on what letters it
contains, and is denoted by K(n, M).

Let us recall two results for special scattered words, namely d-subwords and
super-d-subwords.

For a rainbow word of length n the super-d-complexity [9] is equal to

\[ K(n, \{d, d+1, \ldots, n-1\}) = \sum_{k \ge 0} \binom{n - (d-1)k}{k+1}, \tag{1} \]

and the (n - d)-complexity [8] is

\[ K(n, \{1, 2, \ldots, n-d\}) = 2^n - (d-2) \cdot 2^{d-1} - 2, \quad \text{for } n \ge 2d - 2. \]
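Both recalled formulas can be checked numerically for small n. The sketch below is an illustration only: `count_subwords`, `super_d` and `n_minus_d` are hypothetical helper names, and the counting routine uses a simple dynamic programme over end positions (an assumption of this sketch, valid because in a rainbow word different position sequences give different subwords).

```python
from math import comb


def count_subwords(n, gaps):
    """Count M-subwords of a rainbow word of length n (gaps = M, positive integers)."""
    ends = [0] * (n + 1)  # ends[i]: subwords whose last letter is at position i (1-based)
    for i in range(1, n + 1):
        ends[i] = 1 + sum(ends[i - g] for g in gaps if i - g >= 1)
    return sum(ends)


def super_d(n, d):
    """Right-hand side of formula (1)."""
    total, k = 0, 0
    while n - (d - 1) * k >= k + 1:  # later binomial coefficients are zero
        total += comb(n - (d - 1) * k, k + 1)
        k += 1
    return total


def n_minus_d(n, d):
    """Closed form of the (n - d)-complexity, stated for n >= 2d - 2."""
    return 2 ** n - (d - 2) * 2 ** (d - 1) - 2


print(super_d(7, 2), count_subwords(7, range(2, 7)))
print(n_minus_d(6, 4), count_subwords(6, range(1, 3)))
```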
For special cases the following propositions can be easily proved. 
Proposition 3 For n, $d_1 \le d_2$ positive integers

\[ K(n, \{d_1, d_1+1, \ldots, d_2\}) \le n + \sum_{k \ge 1} \binom{n - (d_1 - 1)k}{k+1} - \sum_{k \ge 1} \binom{n - d_2 k}{k+1}. \]

Proof. This can be obtained from (1) and the formula

\[ K(n, \{d_1, d_1+1, \ldots, d_2\}) \le K(n, \{d_1, d_1+1, \ldots, n-1\}) - K(n, \{d_2+1, d_2+2, \ldots, n-1\}) + n. \]

□

For example, K(7, {2, 3, 4, 5, 6}) = 33, K(7, {4, 5, 6}) = 13, and from the
proposition K(7, {2, 3}) ≤ 27. The exact value is K(7, {2, 3}) = 25; the two
words acg and aeg are not eliminated (here the original distances are 2 and 4
in acg, and 4 and 2 in aeg).
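The numbers in this example can be reproduced by enumerating the subwords of the rainbow word abcdefg directly. This is a sketch under assumptions: the helper `m_subwords` and the variable names are introduced here for illustration and do not come from the paper.

```python
def m_subwords(word, gaps):
    """Set of M-subwords of `word` (gaps = M, assumed to be positive integers)."""
    allowed = sorted(set(gaps))
    found = set()
    stack = [(i, word[i]) for i in range(len(word))]
    while stack:
        last, sub = stack.pop()
        found.add(sub)
        for g in allowed:
            if last + g < len(word):
                stack.append((last + g, sub + word[last + g]))
    return found


word = "abcdefg"
n = len(word)
wide = m_subwords(word, range(2, 7))  # {2, ..., 6}-subwords
far = m_subwords(word, range(4, 7))   # {4, 5, 6}-subwords
near = m_subwords(word, range(2, 4))  # {2, 3}-subwords

bound = len(wide) - len(far) + n      # upper bound from the proposition
leftover = wide - far - near          # counted by the bound, but not {2, 3}-subwords

print(len(wide), len(far), bound, len(near))  # 33 13 27 25
print(sorted(leftover))                       # ['acg', 'aeg']
```

The gap between the bound and the exact value is exactly the number of subwords that mix a short gap with a long one.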

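Proposition 4 For the integers $n, d \ge 1$, where $n = hd + m$ with $0 \le m < d$,

\[ K(n, \{d\}) = \frac{(h+1)(n+m)}{2}. \]

Proof.

\[ K(n, \{d\}) = n + \sum_{i=1}^{n-d} \left\lfloor \frac{n-i}{d} \right\rfloor = n + d(1 + 2 + \cdots + (h-1)) + mh = n + \frac{dh(h-1)}{2} + mh = \frac{(h+1)(n+m)}{2}. \]

□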
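The closed form for a single gap can be compared with the sum used in the proof and with a direct count. In the sketch below the function names are hypothetical, and n = hd + m is read as division with remainder (0 ≤ m < d), which is an assumption about the intended meaning of h and m.

```python
def k_closed_form(n, d):
    """(h + 1)(n + m) / 2 with n = h*d + m, 0 <= m < d."""
    h, m = divmod(n, d)
    return (h + 1) * (n + m) // 2  # the product is always even


def k_by_sum(n, d):
    """n + sum over i = 1 .. n-d of floor((n - i) / d), as in the proof."""
    return n + sum((n - i) // d for i in range(1, n - d + 1))


def k_direct(n, d):
    """Direct count: a {d}-subword is a start position plus a number of steps of size d."""
    return sum(len(range(start, n + 1, d)) for start in range(1, n + 1))


for n, d in [(7, 2), (8, 3), (10, 10), (5, 9)]:
    print(n, d, k_closed_form(n, d), k_by_sum(n, d), k_direct(n, d))
```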
Figure 1: Graph for (2, n - 1)-subwords when n = 6. 



To compute the M-complexity of a rainbow word of length n we will use
graph theoretical results. Let us consider the rainbow word $a_1 a_2 \ldots a_n$
and the corresponding digraph G = (V, E), with

\[ V = \{a_1, a_2, \ldots, a_n\}, \]

\[ E = \{(a_i, a_j) \mid j - i \in M,\ i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n\}. \]

For n = 6, M = {2, 3, 4, 5} see Figure 1.

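The adjacency matrix $A = (a_{ij})_{i = 1, \ldots, n;\ j = 1, \ldots, n}$ of the graph is defined by:

\[ a_{ij} = \begin{cases} 1, & \text{if } j - i \in M, \\ 0, & \text{otherwise,} \end{cases} \quad \text{for } i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n. \]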



Because the graph has no directed cycles, the entry in row i and column j of
$A^k$ (where $A^k = A^{k-1} A$, with $A^1 = A$) represents the number of directed
paths of length k from $a_i$ to $a_j$. If I is the identity matrix (with entries
equal to 1 only on the main diagonal, and 0 otherwise), let us define the matrix
$R = (r_{ij})$:

\[ R = I + A + A^2 + \cdots + A^k, \quad \text{where } A^{k+1} = O \text{ (the null matrix)}. \]

The M-complexity of a rainbow word is then

\[ K(n, M) = \sum_{i=1}^{n} \sum_{j=1}^{n} r_{ij}. \]
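As an illustration of the matrix formulation, the sum of the powers of the adjacency matrix can be accumulated until the power becomes the null matrix. This is a plain-Python sketch; the names `adjacency`, `mat_mul` and `complexity_by_powers` are assumptions made here, and M is assumed to contain only positive integers so that the matrix is nilpotent.

```python
def adjacency(n, gaps):
    """n x n 0/1 matrix with a 1 in (i, j) exactly when j - i is in M."""
    gap_set = set(gaps)
    return [[1 if (j - i) in gap_set else 0 for j in range(n)] for i in range(n)]


def mat_mul(x, y):
    """Product of two square integer matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def complexity_by_powers(n, gaps):
    """Sum of all entries of R = I + A + A^2 + ... (stops at the null matrix)."""
    a = adjacency(n, gaps)
    r = [[int(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    power = a
    while any(any(row) for row in power):
        r = [[r[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, a)
    return sum(map(sum, r))


print(complexity_by_powers(6, {2, 3, 4, 5}))
print(complexity_by_powers(4, {1, 3}))
```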



Matrix R can be computed more efficiently using a variant of the well-known
Warshall algorithm (for the original Warshall algorithm see for example [12]):

Warshall(A, n)
  W ← A
  for k ← 1 to n
     do for i ← 1 to n
           do for j ← 1 to n
                 do w_ij ← w_ij + w_ik w_kj
  return W



From W we obtain easily R = I + W. 

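For example let us consider the graph in Figure 1. The corresponding
adjacency matrix is:

\[ A = \begin{pmatrix}
0 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}. \]

After applying the Warshall algorithm:

\[ W = \begin{pmatrix}
0 & 0 & 1 & 1 & 2 & 3 \\
0 & 0 & 0 & 1 & 1 & 2 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad
R = \begin{pmatrix}
1 & 0 & 1 & 1 & 2 & 3 \\
0 & 1 & 0 & 1 & 1 & 2 \\
0 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \]

and then K(6, {2, 3, 4, 5}) = 20, the sum of elements in R.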
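The path-counting variant of the Warshall algorithm can be run on this example. The code below is a sketch with 0-based indices; `warshall_count` is a name chosen here, and the in-place update is assumed to be safe because every edge goes from a smaller to a larger index.

```python
def warshall_count(a):
    """Path-counting Warshall variant: w[i][j] becomes the number of directed paths i -> j."""
    n = len(a)
    w = [row[:] for row in a]  # work on a copy of the adjacency matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                w[i][j] += w[i][k] * w[k][j]
    return w


n, gaps = 6, {2, 3, 4, 5}
a = [[1 if (j - i) in gaps else 0 for j in range(n)] for i in range(n)]
w = warshall_count(a)
for row in w:
    print(row)

# R = I + W, so the complexity is the sum of W plus the n diagonal ones.
print(sum(map(sum, w)) + n)  # 20
```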

The Warshall algorithm combined with the Latin square method can be
used to obtain all nontrivial (with length at least 2) M-subwords of a given
rainbow word $a_1 a_2 \ldots a_n$. Let us consider a matrix $\mathcal{A}$ whose
entries $A_{ij}$ are sets of words. Initially this matrix is defined as:

\[ A_{ij} = \begin{cases} \{a_i a_j\}, & \text{if } j - i \in M, \\ \emptyset, & \text{otherwise,} \end{cases} \quad \text{for } i = 1, 2, \ldots, n,\ j = 1, 2, \ldots, n. \]



If A and B are sets of words, AB is the set of concatenations of each word
from A with each word from B:

\[ AB = \{ab \mid a \in A,\ b \in B\}. \]

If $s = s_1 s_2 \ldots s_p$ is a word, let us denote by $'s$ the word obtained
from s by erasing the first character: $'s = s_2 s_3 \ldots s_p$. Let us denote
by $'A_{ij}$ the set $A_{ij}$ in which we erase the first character from each
element. In this case $'\mathcal{A}$ is a matrix with entries $'A_{ij}$.

Starting with the matrix A defined as before, the algorithm to obtain all 
nontrivial M-subwords is the following: 

Warshall-Latin($\mathcal{A}$, n)
  W ← $\mathcal{A}$
  for k ← 1 to n
     do for i ← 1 to n
           do for j ← 1 to n
                 do if W_ik ≠ ∅ and W_kj ≠ ∅
                       then W_ij ← W_ij ∪ W_ik 'W_kj
  return W



The set of nontrivial M-subwords is

\[ \bigcup_{i, j \in \{1, 2, \ldots, n\}} W_{ij}. \]

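For n = 8, M = {3, 4, 5, 6, 7} (the rainbow word abcdefgh) the initial matrix is:

\[ \begin{pmatrix}
\emptyset & \emptyset & \emptyset & \{ad\} & \{ae\} & \{af\} & \{ag\} & \{ah\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \{be\} & \{bf\} & \{bg\} & \{bh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{cf\} & \{cg\} & \{ch\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{dg\} & \{dh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{eh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset
\end{pmatrix} \]

The result of the algorithm Warshall-Latin in this case is:

\[ \begin{pmatrix}
\emptyset & \emptyset & \emptyset & \{ad\} & \{ae\} & \{af\} & \{ag, adg\} & \{ah, adh, aeh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \{be\} & \{bf\} & \{bg\} & \{bh, beh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{cf\} & \{cg\} & \{ch\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{dg\} & \{dh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \{eh\} \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset \\
\emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset & \emptyset
\end{pmatrix} \]
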

The algorithm Warshall-Latin can be used for nonrainbow words too,
with the remark that repeated subwords must be eliminated. For the word
aabbbaaa and M = {3, 4, 5, 6, 7} the result is: aa, ab, aba, ba.
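A possible rendering of Warshall-Latin in Python is sketched below. It is not the author's implementation: the function names are hypothetical, indices are 0-based, and Python sets are used for the matrix entries, so repeated subwords of nonrainbow words are eliminated automatically.

```python
def warshall_latin(word, gaps):
    """Matrix W whose entry (i, j) is the set of M-subwords from position i to position j."""
    n = len(word)
    gap_set = set(gaps)
    # Initial matrix: the two-letter word for each allowed gap, the empty set otherwise.
    w = [[{word[i] + word[j]} if (j - i) in gap_set else set() for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if w[i][k] and w[k][j]:
                    # Concatenate, dropping the first letter of the second word,
                    # because the letter at position k would otherwise appear twice.
                    w[i][j] |= {p + q[1:] for p in w[i][k] for q in w[k][j]}
    return w


def nontrivial_subwords(word, gaps):
    """Union of all matrix entries: the M-subwords of length at least 2."""
    w = warshall_latin(word, gaps)
    return set().union(*(cell for row in w for cell in row))


print(sorted(nontrivial_subwords("abcdefgh", range(3, 8))))
print(sorted(nontrivial_subwords("aabbbaaa", range(3, 8))))
```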

4 Computing the $(d_1, d_2)$-complexity

Let us denote by $a_i$ the number of $(d_1, d_2)$-subwords which terminate at
position i in a rainbow word of length n. Then

\[ a_i = 1 + a_{i-d_1} + a_{i-d_1-1} + \cdots + a_{i-d_2}, \tag{2} \]

with the remark that for $i \le 0$ we have $a_i = 0$. Subtracting $a_{i-1}$ from
$a_i$ we get the following simpler equation (valid for $i \ge 2$, with $a_1 = 1$):

\[ a_i = a_{i-1} + a_{i-d_1} - a_{i-1-d_2}. \]

The $(d_1, d_2)$-complexity of a rainbow word of length n is

\[ K(n, \{d_1, d_1+1, \ldots, d_2\}) = \sum_{i=1}^{n} a_i. \tag{3} \]

For example, if $d_1 = 2$, $d_2 = 4$, the following values are obtained:

n                   1    2    3    4    5    6    7    8    9   10   11   12   13
a_n                 1    1    2    3    5    7   11   16   24   35   52   76  112
K(n, {2, 3, 4})     1    2    4    7   12   19   30   46   70  105  157  233  345


If we denote by $A(z) = \sum_{n \ge 1} a_n z^n$ the generating function of the
sequence $a_n$, then from (2) we obtain

\[ \sum_{n \ge 1} a_n z^n = \sum_{n \ge 1} z^n + \sum_{n \ge 1} a_{n-d_1} z^n + \cdots + \sum_{n \ge 1} a_{n-d_2} z^n \]

and

\[ A(z) = \frac{z}{1-z} + z^{d_1} A(z) + \cdots + z^{d_2} A(z). \]

From this we obtain

\[ A(z) = \frac{z}{z^{d_2+1} - z^{d_1} - z + 1}. \tag{4} \]

For $d_1 = 2$, $d_2 = 4$ the sequence $(a_n)_{n \ge 0}$ ([10] sequence A023435)
corresponds to a variant of the dying rabbits problem [6].

To compute the generating function for the complexity
$K(n, \{d_1, d_1+1, \ldots, d_2\})$, let us denote this complexity simply by
$K_n$, and its generating function by $K(z) = \sum_{n \ge 1} K_n z^n$. We remark
that $K_n = 0$ for $n \le 0$, and $K_1 = 1$.

From (3) and (4) we can immediately conclude that

\[ K(z) = \frac{A(z)}{1-z} = \frac{z}{(1-z)(z^{d_2+1} - z^{d_1} - z + 1)}. \]

5 Correspondence between (d, n + d - 1)-subwords
and {1, d}-subwords

The following result is inspired by the sequence A050228 ¹ of [10].

Proposition 5 The number of {1, d}-subwords of a rainbow word of length n
is equal to the number of $\{d, d+1, \ldots, n+d-1\}$-subwords of length at
least 2 of a rainbow word of length n + d.

Proof. By the generalization of the sequence A050228 [10] the number of the
{1, d}-subwords of a rainbow word of length n is equal to

\[ \sum_{k \ge 0} \binom{n + 1 - (d-1)k}{k+2}. \]

From (1) we have

\[ K(n+d, \{d, d+1, \ldots, n+d-1\}) - (n+d) = \sum_{k \ge 1} \binom{n + d - (d-1)k}{k+1}. \]

By changing k to k + 1 in the sum, we obtain
$\sum_{k \ge 0} \binom{n + 1 - (d-1)k}{k+2}$, and this proves the proposition. □
Example. For abcde the 19 {1, 3}-subwords are:

a, b, c, d, e, ab, abc, abcd, ad, ade, abcde, abe, bc, bcd, bcde, be, cd, cde, de.

For abcdefgh the 19 {3, 4, 5, 6, 7}-subwords of length at least 2 are:
ad, ae, af, ag, adg, ah, adh, aeh, be, bf, bg, bh, beh, cf, cg, ch, dg, dh, eh.
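The correspondence can be checked numerically for small parameters. The sketch below is an illustration with hypothetical function names; it counts by dynamic programming over end positions and is run for d ≥ 2 only, since for d = 1 the set {1, d} collapses to {1}.

```python
from math import comb


def count_subwords(n, gaps):
    """Number of M-subwords of a rainbow word of length n (gaps = M, positive integers)."""
    ends = [0] * (n + 1)  # ends[i]: subwords whose last letter is at position i
    for i in range(1, n + 1):
        ends[i] = 1 + sum(ends[i - g] for g in gaps if i - g >= 1)
    return sum(ends)


def one_d_subwords(n, d):
    """Number of {1, d}-subwords of a rainbow word of length n."""
    return count_subwords(n, {1, d})


def long_super_d_subwords(n, d):
    """Number of {d, ..., n+d-1}-subwords of length >= 2 of a word of length n + d."""
    return count_subwords(n + d, range(d, n + d)) - (n + d)


def binomial_sum(n, d):
    """Sum over k >= 0 of C(n + 1 - (d-1)k, k + 2), as used in the proof."""
    total, k = 0, 0
    while n + 1 - (d - 1) * k >= k + 2:
        total += comb(n + 1 - (d - 1) * k, k + 2)
        k += 1
    return total


print(one_d_subwords(5, 3), long_super_d_subwords(5, 3), binomial_sum(5, 3))
```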



¹ A050228: $a_n$ is the number of subsequences $\{s_k\}$ of $\{1, 2, 3, \ldots, n\}$
such that $s_{k+1} - s_k$ is 1 or 3.

Conclusions 

A special scattered subword, the so-called M-subword, is defined, in which the
distances (gaps) between letters are from the set M. The number of the
M-subwords of a given word is the M-complexity. Graph algorithms are used
to compute the M-complexity and to determine all M-subwords of a rainbow
word. This notion of M-complexity is a generalization of the d-complexity
[7] and of the super-d-complexity [9]. If M consists of successive numbers
from $d_1$ to $d_2$ then the so-called $(d_1, d_2)$-complexity is computed by
recursive equations and generating functions.